The study of various decision problems for logic fragments has a long history in computer science. This paper is on the membership problem for a fragment of first-order logic over infinite words; the membership problem asks for a given language whether it is definable in some fixed fragment. The alphabetic topology was introduced as part of an effective characterization of the fragment $\Sigma_2$ over infinite words. Here, $\Sigma_2$ consists of the first-order formulas with two blocks of quantifiers, starting with an existential quantifier. Its Boolean closure is $\mathbb{B}\Sigma_2$. Our first main result is an effective characterization of the Boolean closure of the alphabetic topology, that is, given an $\omega$-regular language $L$, it is decidable whether $L$ is a Boolean combination of open sets in the alphabetic topology. This is then used for transferring Place and Zeitoun's recent decidability result for $\mathbb{B}\Sigma_2$ from finite to infinite words.


1 Introduction 

Over finite words, the connection between finite monoids and regular languages is highly successful for studying logic fragments, see e.g. [?]. Over infinite words, the algebraic approach uses infinite repetitions. Not every logic fragment can express whether some definable property $P$ occurs infinitely often. For instance, the usual approach for saying that $P$ occurs infinitely often is as follows: for every position $x$ there is a position $y > x$ satisfying $P(y)$. Similarly, $P$ occurs only finitely often if there is a position $x$ such that all positions $y > x$ satisfy $\neg P(y)$. Each of these formulas requires (at least) one additional change of quantifiers, which not all fragments can provide. It turns out that topology is a very useful tool for restricting the infinite behaviour of the algebraic approach accordingly, see e.g. [?]. In particular, the combination of algebra and topology is convenient for the study of languages in $\Gamma^\infty$, the set of finite and infinite words over the alphabet $\Gamma$. In this paper, a regular language is a regular subset of $\Gamma^\infty$.

Topological ideas have a long history in the study of $\omega$-regular languages. The Cantor topology is the most famous example in this context. We write $G$ for the Cantor-open sets and $F$ for the closed sets. The open sets in $G$ are the languages of the form $W\Gamma^\infty$ for $W \subseteq \Gamma^*$. If $X$ is a class of languages, then $X_\delta$ consists of the countable intersections of languages in $X$ and $X_\sigma$ are the countable unions; moreover, we write $\mathbb{B}X$ for the Boolean closure of $X$. Since $F$ contains the complements of languages in $G$, we have $\mathbb{B}F = \mathbb{B}G$. The Borel hierarchy is defined by iterating the operations $X \mapsto X_\delta$ and $X \mapsto X_\sigma$. The Borel hierarchy over the Cantor topology has many appearances in the context of $\omega$-regular languages. For instance, an $\omega$-regular language is deterministic if and only if it is in $G_\delta$, see [?]. By McNaughton's Theorem [9], every $\omega$-regular language is in $\mathbb{B}(G_\delta) = \mathbb{B}(F_\sigma)$. The inclusion $\mathbb{B}G \subset G_\delta \cap F_\sigma$ is strict, but the $\omega$-regular languages in $\mathbb{B}G$ and $G_\delta \cap F_\sigma$ coincide [?].

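[Figure: inclusion diagram of the classes open $G$, closed $F$, $\mathbb{B}G = \mathbb{B}F$, deterministic, and $\omega$-regular (contained in $\mathbb{B}(G_\delta) = \mathbb{B}(F_\sigma)$).]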



Let $\mathrm{FO}^k$ be the fragment of first-order logic which uses (and reuses) at most $k$ variables. By $\Sigma_m$ we denote the formulas with $m$ quantifier blocks, starting with a block of existential quantifiers. Here, we assume that $x < y$ is the only binary predicate. We frequently identify a fragment with the languages definable therein. Let us consider $\mathrm{FO}^1$ as a toy example. With only one variable, we cannot make use of the binary predicate $x < y$. Therefore, in $\mathrm{FO}^1$ we can say nothing but which letters occur, that is, a language is definable in $\mathrm{FO}^1$ if and only if it is a Boolean combination of languages of the form $\Gamma^* a \Gamma^\infty$ for $a \in \Gamma$. Thus $\mathrm{FO}^1 \subseteq \mathbb{B}G$. It is an easy exercise to show that a regular language is in $\mathrm{FO}^1$ if and only if it is in $\mathbb{B}G$ and its syntactic monoid is both idempotent and commutative. The algebraic condition without the topology is too powerful since this would also include the language $\{a,b\}^* a^\omega$, which is not in $\mathrm{FO}^1$. For the fragment $\mathbb{B}\Sigma_1$, the same topology $\mathbb{B}G$ with a different algebraic condition works, cf. [10, Theorems VI.3.7, VI.7.4 and VIII.4.5].

In the fragment $\Sigma_2$, we can define the language $\{a,b\}^* a b^\infty$ which is not deterministic and hence not in $G_\delta$. Since the next level of the Borel hierarchy already contains all regular languages, another topology is required. For this purpose, Diekert and the first author introduced the alphabetic topology [1]: the open sets in this topology are arbitrary unions of languages of the form $uA^\infty$ for $u \in \Gamma^*$ and $A \subseteq \Gamma$. They showed that a regular language is definable in $\Sigma_2$ if and only if it satisfies some particular algebraic property and if it is open in the alphabetic topology. Therefore, the canonical ingredient for an effective characterization of $\mathbb{B}\Sigma_2$ is the Boolean closure of the open sets in the alphabetic topology. Our first main result shows that, for a given regular language $L$, it is decidable whether $L$ is a Boolean combination of open sets in the alphabetic topology. As a by-product, we see that every $\omega$-regular language which is a Boolean combination of arbitrary open sets in the alphabetic topology can be written as a Boolean combination of $\omega$-regular open sets. This resembles a similar result for the Cantor topology [16].

A major breakthrough in the theory of regular languages over finite words is due to Place and Zeitoun [13]. They showed that, for a given regular language $L \subseteq \Gamma^*$, it is decidable whether $L$ is definable in $\mathbb{B}\Sigma_2$. This solved a longstanding open problem, see e.g. [?, Section 8] for an overview. To date, no effective characterization of $\mathbb{B}\Sigma_3$ is known. Our second main result is to show that this decidability result transfers to languages in $\Gamma^\infty$. If $\mathbf{V}_2$ is the algebraic counterpart of $\mathbb{B}\Sigma_2$ over finite words, then we show that $\mathbf{V}_2$ combined with the Boolean closure of the alphabetic topology yields a characterization of $\mathbb{B}\Sigma_2$ over $\Gamma^\infty$. Combining the decidability of $\mathbf{V}_2$ with our first main result, the latter characterization is effective. The proof that $\mathbb{B}\Sigma_2$ satisfies both the algebraic and the topological restrictions follows a rather straightforward approach. The main difficulty is to show the converse: every language satisfying both the algebraic and the topological conditions is definable in $\mathbb{B}\Sigma_2$.

2 Preliminaries 

Words 

Let $\Gamma$ be a finite alphabet. By $\Gamma^*$ we denote the set of finite words over $\Gamma$; we write $1$ for the empty word. The set of infinite words is $\Gamma^\omega$ and the set of finite and infinite words is $\Gamma^\infty = \Gamma^* \cup \Gamma^\omega$. By $u, v, w$ we denote finite words and by $\alpha, \beta, \gamma$ we denote words in $\Gamma^\infty$. In this paper a language is a subset of $\Gamma^\infty$. Let $L \subseteq \Gamma^*$ and $K \subseteq \Gamma^\infty$. As usual, $L^*$ is the union of powers of $L$ and $LK = \{u\alpha \mid u \in L, \alpha \in K\} \subseteq \Gamma^\infty$ is the concatenation of $L$ and $K$. By $L^\omega$ we denote the set of words which are an infinite concatenation of words in $L$, and the infinite concatenation $uu\cdots$ of the word $u$ is written $u^\omega$. A word $u = a_1 \cdots a_n$ is a (scattered) subword of $v$ if $v \in \Gamma^* a_1 \Gamma^* \cdots a_n \Gamma^*$. The alphabet of a word is the set of all letters which appear in the word. The imaginary alphabet $\mathrm{im}(\alpha)$ of a word $\alpha \in \Gamma^\infty$ is the set of letters which appear infinitely often in $\alpha$. Let $A^{\mathrm{im}} = \{\alpha \in \Gamma^\infty \mid \mathrm{im}(\alpha) = A\}$ be the set of words with imaginary alphabet $A$. In the following we restrict ourselves to the study of regular languages. A language $L \subseteq \Gamma^*$ is regular if it is recognized by a (deterministic) finite automaton. A language $K \subseteq \Gamma^\omega$ is regular if it is recognized by a Büchi automaton. A language $L \subseteq \Gamma^\infty$ is regular if $L \cap \Gamma^*$ and $L \cap \Gamma^\omega$ are regular. This is equivalent to being recognized by an extended Büchi automaton [2].
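
The notions of alphabet, imaginary alphabet and scattered subword can be made concrete on ultimately periodic words $uv^\omega$. The following is a minimal Python sketch under that assumption; the encoding of a word as a pair `(prefix, period)` and all function names are illustrative choices of this sketch and do not appear in the paper. A finite word is encoded with an empty period.

```python
def alphabet(prefix: str, period: str = "") -> set:
    """Letters occurring in the word prefix · period^ω."""
    return set(prefix) | set(period)


def imaginary_alphabet(prefix: str, period: str = "") -> set:
    """Letters occurring infinitely often in prefix · period^ω.

    Only the letters of the period repeat forever, so a finite word
    (empty period) has the empty imaginary alphabet.
    """
    return set(period)


def is_scattered_subword(u: str, v: str) -> bool:
    """True if the finite word u is a scattered subword of the finite word v."""
    remaining = iter(v)
    # `letter in remaining` consumes the iterator up to the first match,
    # so the letters of u are matched from left to right in v.
    return all(letter in remaining for letter in u)
```

For instance, the word $ab^\omega$ is encoded as `("a", "b")`; it has alphabet $\{a, b\}$ and imaginary alphabet $\{b\}$, so it lies in $\{b\}^{\mathrm{im}}$.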

First-order logic

We consider first-order logic $\mathrm{FO}$ over $\Gamma^\infty$. Variables range over the positions of the word. The atomic formulas in this logic are $\top$ for true, $x < y$ to compare two positions $x$ and $y$, and $\lambda(x) = a$ which is true if the word has an $a$ at position $x$. One may combine those atomic formulas with the Boolean connectives $\neg$, $\wedge$ and $\vee$ and quantifiers $\forall$ and $\exists$. A sentence $\varphi$ is an $\mathrm{FO}$ formula without free variables. We write $\alpha \models \varphi$ if $\alpha \in \Gamma^\infty$ satisfies the sentence $\varphi$. The language defined by $\varphi$ is $L(\varphi) = \{\alpha \in \Gamma^\infty \mid \alpha \models \varphi\}$. We classify the formulas of $\mathrm{FO}$ by counting the number of quantifier alternations, that is, the number of alternations of $\exists$ and $\forall$. The fragment $\Sigma_i$ of $\mathrm{FO}$ contains all $\mathrm{FO}$-formulas in prenex normal form with $i$ blocks of quantifiers $\exists$ or $\forall$, starting with a block of existential quantifiers. The fragment $\mathbb{B}\Sigma_i$ contains all Boolean combinations of formulas in $\Sigma_i$. We are particularly interested in the fragment $\Sigma_2$ and the Boolean combinations of formulas in $\Sigma_2$. A language $L$ is definable in a fragment $\mathcal{F}$ (e.g. $\mathcal{F}$ is $\Sigma_2$ or $\mathbb{B}\Sigma_2$) if there exists a formula $\varphi \in \mathcal{F}$ such that $L = L(\varphi)$. The classes of languages defined by $\Sigma_i$ and $\mathbb{B}\Sigma_i$ form a hierarchy, the quantifier alternation hierarchy. This hierarchy is strict, i.e., $\Sigma_i \subsetneq \mathbb{B}\Sigma_i \subsetneq \Sigma_{i+1}$ holds for all $i$, cf. [?].

Monomials 

A monomial is a language of the form $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^\infty$ for $n \ge 0$, $a_i \in \Gamma$ and $A_i \subseteq \Gamma$. The number $n$ is called the degree. In particular, $A_0^\infty$ is a monomial of degree $0$. A monomial is called a $k$-monomial if it has degree at most $k$. In [1] it is shown that a language $L \subseteq \Gamma^\infty$ is in $\Sigma_2$ if and only if it is a finite union of monomials. We are interested in $\mathbb{B}\Sigma_2$ and thus in finite Boolean combinations of monomials $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^\infty$.

For this, let $\equiv_k^\infty$ be the equivalence relation on $\Gamma^\infty$ such that $\alpha \equiv_k^\infty \beta$ if $\alpha$ and $\beta$ are contained in exactly the same $k$-monomials. Thus, $\equiv_k^\infty$-classes are Boolean combinations of monomials and every language in $\mathbb{B}\Sigma_2$ is a union of $\equiv_k^\infty$-classes for some $k$. Further, since there are only finitely many monomials of degree $k$, there are only finitely many $\equiv_k^\infty$-classes. The equivalence class of some word $\alpha$ in $\equiv_k^\infty$ is denoted by $[\alpha]_k^\infty$. Note that such a characterization of $\mathbb{B}\Sigma_2$ in terms of monomials does not yield a decidable characterization.

Our characterization of languages $L \subseteq \Gamma^\infty$ in $\mathbb{B}\Sigma_2$ is based on the characterization of languages in $\mathbb{B}\Sigma_2$ over finite words. For this, we also introduce monomials over $\Gamma^*$. A monomial over $\Gamma^*$ is a language of the form $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^*$ for $n \ge 1$, $a_i \in \Gamma$ and $A_i \subseteq \Gamma$. The degree is defined as above. Let $\equiv_k$ be the congruence on $\Gamma^*$ which is defined by $u \equiv_k v$ if and only if $u$ and $v$ are contained in the same $k$-monomials over $\Gamma^*$. The equivalence classes are denoted by $[u]_k$. Again, a language $L \subseteq \Gamma^*$ is in $\mathbb{B}\Sigma_2$ over $\Gamma^*$ if and only if it is a union of $\equiv_k$-classes for some $k$, i.e., if $L = \bigcup_{u \in L} [u]_k$.
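
Membership of a finite word in a monomial over $\Gamma^*$ can be tested mechanically, because a monomial is a regular expression of a very restricted shape. The sketch below assumes that letters are single characters and encodes a monomial $A_0^* a_1 A_1^* \cdots a_n A_n^*$ by the list of its alphabets and the list of its marked letters; the helper names `monomial_regex` and `in_monomial` are hypothetical and not notation from the paper.

```python
import re


def monomial_regex(alphabets, letters):
    """Compile A_0* a_1 A_1* ... a_n A_n* into a regular expression.

    alphabets: list of n+1 sets of letters (A_0, ..., A_n)
    letters:   list of n marked letters (a_1, ..., a_n)
    """
    if len(alphabets) != len(letters) + 1:
        raise ValueError("need exactly one more alphabet than marked letters")

    def star(A):
        # The empty alphabet contributes only the empty word.
        if not A:
            return ""
        return "[" + "".join(re.escape(c) for c in sorted(A)) + "]*"

    parts = [star(alphabets[0])]
    for a, A in zip(letters, alphabets[1:]):
        parts.append(re.escape(a) + star(A))
    return re.compile("".join(parts))


def in_monomial(word, alphabets, letters):
    """True if the finite word lies in the monomial."""
    return monomial_regex(alphabets, letters).fullmatch(word) is not None
```

With `alphabets = [{"a", "b"}, set(), {"a", "b"}]` and `letters = ["a", "a"]` this encodes $\{a,b\}^* a \emptyset^* a \{a,b\}^*$, the finite words containing the factor $aa$. Deciding $u \equiv_k v$ by enumerating all $k$-monomials is possible in principle but exponential in $|\Gamma|$ and $k$, which is why this sketch stops at single membership tests.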

Algebra 

In this paper all monoids are either finite or free. Finite monoids are a common way for defining regular languages. A monoid element $e$ is idempotent if $e^2 = e$. Every element $x$ of a finite monoid admits a unique idempotent power $x^i$ for some integer $i \ge 1$. An ordered monoid $(M, \le)$ is a monoid equipped with a partial order which is compatible with the monoid multiplication, i.e., $s \le t$ and $s' \le t'$ implies $ss' \le tt'$. Every monoid can be ordered by using the identity as partial order. For a homomorphism $h : (N, \le) \to (M, \le)$ between ordered monoids we require $s \le t \Rightarrow h(s) \le h(t)$ for all $s, t \in N$. A divisor is the homomorphic image of a submonoid.

A class of monoids which is closed under division and finite direct products is a pseudovariety. Eilenberg showed a correspondence between certain classes of languages (of finite words) and pseudovarieties [3]. A homomorphism $h : (N, \le) \to (M, \le)$ between two ordered monoids must satisfy $s \le t \Rightarrow h(s) \le h(t)$ for $s, t \in N$. A pseudovariety of ordered monoids is defined in the same way as for unordered monoids, using the homomorphisms of ordered monoids. The Eilenberg correspondence then also holds for ordered monoids [?]. Let $\mathbf{V}_{3/2}$ be the pseudovariety of ordered monoids which corresponds to $\Sigma_2$ and $\mathbf{V}_2$ be the pseudovariety of monoids which corresponds to languages in $\mathbb{B}\Sigma_2$. Since $\Sigma_2 \subseteq \mathbb{B}\Sigma_2$, we obtain $\mathbf{V}_{3/2} \subseteq \mathbf{V}_2$ when ignoring the order. The connection between monoids and languages is given by the notion of recognizability. A language $L \subseteq \Gamma^*$ is recognized by an ordered monoid $(M, \le)$ if there is a monoid homomorphism $h : \Gamma^* \to M$ such that $L = \bigcup \{ h^{-1}(t) \mid s \le t \text{ for some } s \in h(L) \}$. If $M$ is not ordered, then this means that $L$ is an arbitrary union of languages of the form $h^{-1}(t)$.

For $\omega$-languages $L \subseteq \Gamma^\infty$ the notion of recognizability is slightly more technical. For simplicity, we only consider recognition by non-ordered monoids. Let $h : \Gamma^* \to M$ be a monoid homomorphism. If the homomorphism $h$ is understood, we write $[s]$ for the language $h^{-1}(s)$. We call $(s, e) \in M \times M$ a linked pair if $e^2 = e$ and $se = s$. By Ramsey's Theorem [?], for every word $\alpha \in \Gamma^\infty$ there exists a linked pair $(s, e)$ such that $\alpha \in [s][e]^\omega$. A language $L \subseteq \Gamma^\infty$ is recognized by $h$ if

$$L = \bigcup \{\, [s][e]^\omega \mid (s, e) \text{ is a linked pair with } [s][e]^\omega \cap L \neq \emptyset \,\}.$$

Since $1^\omega = 1$, the language $[1]^\omega$ also contains finite words. We thus obtain recognizability of languages of finite words as a special case. A language $L \subseteq \Gamma^\infty$ is regular if it is recognized by (a homomorphism to) a finite monoid.
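
Idempotent powers and linked pairs of a finite monoid can be computed by brute force. The following minimal sketch assumes that the monoid is given as a list of hashable elements together with a multiplication function `mul`; this encoding and the function names are assumptions of the sketch, not part of the paper.

```python
from itertools import product


def idempotent_power(x, mul):
    """Return the unique idempotent among the powers x, x^2, x^3, ...

    Terminates for every element of a finite monoid, because the powers
    of x eventually cycle and the cycle contains an idempotent.
    """
    y = x
    while mul(y, y) != y:
        y = mul(y, x)
    return y


def linked_pairs(elements, mul):
    """All pairs (s, e) with e·e = e and s·e = s."""
    idempotents = [e for e in elements if mul(e, e) == e]
    return [(s, e) for s, e in product(elements, idempotents)
            if mul(s, e) == s]
```

For the two-element monoid $\{1, 0\}$ with ordinary multiplication, `linked_pairs([1, 0], lambda x, y: x * y)` yields the three pairs $(1,1)$, $(0,1)$ and $(0,0)$; the pair $(1,0)$ is not linked because $1 \cdot 0 \neq 1$.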

Next, we define syntactic homomorphisms and syntactic monoids; as we will see, these are the minimal recognizers of a regular language. Let $L \subseteq \Gamma^\infty$ be a regular language. The syntactic monoid of $L$ is defined as the quotient $\mathrm{Synt}(L) = \Gamma^* / \equiv_L$ where $u \equiv_L v$ holds if and only if for all $x, y, z \in \Gamma^*$ we have both $xuyz^\omega \in L \Leftrightarrow xvyz^\omega \in L$ and $x(uy)^\omega \in L \Leftrightarrow x(vy)^\omega \in L$. The syntactic monoid can be ordered by the quasiorder $\le_L$ defined by $u \le_L v$ if for all $x, y, z \in \Gamma^*$ we have $xuyz^\omega \in L \Rightarrow xvyz^\omega \in L$ and $x(uy)^\omega \in L \Rightarrow x(vy)^\omega \in L$. One can effectively compute the syntactic homomorphism of $L$. The syntactic monoid $\mathrm{Synt}(L)$ satisfies the property that $L$ is regular if and only if $\mathrm{Synt}(L)$ is finite and the canonical homomorphism $h_L : \Gamma^* \to \mathrm{Synt}(L)$ recognizes $L$, see e.g. [10, 20]. Every pseudovariety is generated by its syntactic monoids [3], i.e., every monoid in a given pseudovariety is a divisor of a direct product of syntactic monoids. The importance of the syntactic monoid of some language $L \subseteq \Gamma^\infty$ is that it is the smallest monoid recognizing $L$:

Lemma 1. Let $L \subseteq \Gamma^\infty$ be a language which is recognized by a homomorphism $h : \Gamma^* \to (M, \le)$. Then, $(\mathrm{Synt}(L), \le_L)$ is a divisor of $(M, \le)$.

Proof. We assume that $h$ is surjective and show that $\mathrm{Synt}(L)$ is a quotient of $M$. If $h$ is not surjective, we can therefore conclude that $\mathrm{Synt}(L)$ is a divisor of $M$. We show that $h(u) \le h(v) \Rightarrow u \le_L v$. Let $u, v$ be words with $h(u) \le h(v)$ and denote $h^{-1}(h(w)) = [h(w)]$ for words $w$. Assume $xuyz^\omega \in L$, then there exists an index $i$ such that $(h(xuyz^i), h(z)^i)$ is a linked pair. Thus, $[h(xuyz^i)][h(z)^i]^\omega \subseteq L$ and by $h(u) \le h(v)$ also $[h(xvyz^i)][h(z)^i]^\omega \subseteq L$. This implies $xvyz^\omega \in L$. The proof that $x(uy)^\omega \in L \Rightarrow x(vy)^\omega \in L$ is similar. Thus, $u \le_L v$ holds which shows the claim. □

We stated the lemma for ordered monoids also for languages containing infinite words, but in the ordered setting it will be applied only for finite words.

3 Alphabetic Topology 

The topological component is crucial for our approach. As mentioned in the introduction, combining algebraic and topological conditions is a successful approach for characterizations of language classes over $\Gamma^\infty$. A topology on a set $X$ is given by a family of subsets of $X$ (called open) which is closed under finite intersections and arbitrary unions. We define the alphabetic topology over $\Gamma^\infty$ by its basis $\{uA^\infty \mid u \in \Gamma^*, A \subseteq \Gamma\}$. Hence, an open set is described as $\bigcup_A W_A A^\infty$ with $W_A \subseteq \Gamma^*$. The alphabetic topology has been introduced in [1], where it is used as a part of the characterization of $\Sigma_2$ over $\Gamma^\infty$.

Theorem 2 ([1]). Let $L \subseteq \Gamma^\infty$ be a regular language. Then $L \in \Sigma_2$ if and only if $\mathrm{Synt}(L) \in \mathbf{V}_{3/2}$ and $L$ is open in the alphabetic topology.

The alphabetic topology has by itself been the subject of further study [?]. We are particularly interested in Boolean combinations of open sets. An effective characterization of a language $L$ being a Boolean combination of open sets in the alphabetic topology is given in the theorem below.

Theorem 3. Let $L \subseteq \Gamma^\infty$ be a regular language which is recognized by $h : \Gamma^* \to M$. Then the following are equivalent:

1. $L$ is a Boolean combination of open sets in the alphabetic topology where each open set is regular.

2. $L$ is a Boolean combination of open sets in the alphabetic topology.

3. For all linked pairs $(s, e), (t, f)$ it holds that if there exists an alphabet $C$ and words $\hat{e}, \hat{f}$ with $h(\hat{e}) = e$, $h(\hat{f}) = f$, $\mathrm{alph}(\hat{e}) = \mathrm{alph}(\hat{f}) = C$ and $s \cdot h(C^*) = t \cdot h(C^*)$, then $[s][e]^\omega \subseteq L \Leftrightarrow [t][f]^\omega \subseteq L$.

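Proof. '$1 \Rightarrow 2$': This is trivial.

Figure 1: Part of the right Cayley graph of $M$ in the proof of '$2 \Rightarrow 3$'.

'$2 \Rightarrow 3$': Let $L$ be a Boolean combination of strict alphabetic open sets, that is, of sets of the form $\bigcup_A W_A A^\infty \cap A^{\mathrm{im}}$ with $W_A \subseteq \Gamma^*$ (see Remark 4; every open set of the alphabetic topology is of this form). We may assume

$$L = \bigcup_{i=1}^{n} \Bigl( \bigl(P_i A_i^\infty \cap A_i^{\mathrm{im}}\bigr) \setminus \bigcup_{j=1}^{m_i} \bigl(Q_{i,j} B_{i,j}^\infty \cap B_{i,j}^{\mathrm{im}}\bigr) \Bigr)$$

for some $P_i, Q_{i,j} \subseteq \Gamma^*$ and alphabets $A_i, B_{i,j} \subseteq \Gamma$. Assume $[s][e]^\omega \subseteq L$, but $[t][f]^\omega \not\subseteq L$. It suffices to show that $[t][f]^\omega \cap L$ is nonempty. Let $u\hat{e}^\omega \in [s][e]^\omega \subseteq L$ for some $u \in [s]$, $\hat{e} \in [e]$ with $\mathrm{alph}(\hat{e}) = C$. We also choose some words $\hat{f}, x, y \in C^*$ such that $h(\hat{f}) = f$, $s \cdot h(x) = t$, $t \cdot h(y) = s$ and $\mathrm{alph}(\hat{f}) = C$.

The idea is to find an increasing sequence of words $u_\ell \in [s]$ and sets $I_\ell \subseteq \{1, \ldots, n\}$ such that $u_\ell C^\infty \cap \bigl( (P_i A_i^\infty \cap A_i^{\mathrm{im}}) \setminus \bigcup_j (Q_{i,j} B_{i,j}^\infty \cap B_{i,j}^{\mathrm{im}}) \bigr) = \emptyset$ for all $i \in I_\ell$. Let $u_0 = u$ and $I_0 = \emptyset$. Consider the word $u_\ell \hat{e}^\omega \in L$. There exists an index $i \in \{1, \ldots, n\} \setminus I_\ell$ such that $u_\ell \hat{e}^\omega \in (P_i A_i^\infty \cap A_i^{\mathrm{im}}) \setminus \bigcup_j (Q_{i,j} B_{i,j}^\infty \cap B_{i,j}^{\mathrm{im}})$. Choose $k$ big enough, such that in the decomposition $u_\ell \hat{e}^k \hat{e}^\omega$ the part $u_\ell \hat{e}^k$ overlaps into the $A_i^\infty$ part. Since $\mathrm{im}(u_\ell \hat{e}^\omega) = \mathrm{alph}(\hat{e}) = C$, we have $C = A_i$ and hence also $\beta_\ell = u_\ell \hat{e}^k x \hat{f}^\omega \in P_i A_i^\infty \cap A_i^{\mathrm{im}}$. By construction we have $\beta_\ell \in [t][f]^\omega$ and therefore, assuming $[t][f]^\omega \cap L = \emptyset$, there exists an index $j$ such that $\beta_\ell \in Q_{i,j} B_{i,j}^\infty \cap B_{i,j}^{\mathrm{im}}$. Analogously, there exists a $k'$ such that $u_\ell \hat{e}^k x \hat{f}^{k'} y C^\infty \subseteq Q_{i,j} B_{i,j}^\infty$. Hence we can choose $u_{\ell+1} = u_\ell \hat{e}^k x \hat{f}^{k'} y$ and $I_{\ell+1} = I_\ell \cup \{i\}$.

Since $u_\ell \hat{e}^\omega \in L \cap u_\ell C^\infty$, this construction has to fail at an index $\ell \le n$. Therefore, the assumption is not justified and we have $[t][f]^\omega \cap L \neq \emptyset$, proving the claim.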

'$3 \Rightarrow 1$': Let $\alpha \in [s][e]^\omega \subseteq L$ for a linked pair $(s, e)$. Let $C = \mathrm{im}(\alpha)$. By $\alpha \in [s][e]^\omega$ and the definition of $C$ there exists an $\hat{e} \in C^*$ with $\mathrm{alph}(\hat{e}) = C$ and $h(\hat{e}) = e$. Define

$$L' := L(s, C) := [s]C^\infty \setminus \Bigl( \bigcup_{D \subsetneq C} \Gamma^* D^\infty \;\cup \bigcup_{s \notin t \cdot h(C^*)} [t] C^\infty \Bigr).$$

We have $\alpha \in L'$ and $L'$ is a Boolean combination of open sets in the alphabetic topology where each open set is regular. Since there are only finitely many sets of the type $L(s, C)$, it suffices to show $L' \subseteq L$. For $C = \emptyset$ we have $L' = [s]$ and hence $L' \subseteq L$. Thus, we may assume $C \neq \emptyset$. Let $\beta \in L'$ be an arbitrary element and let $\beta \in [t][f]^\omega$ for a linked pair $(t, f)$. Since $\beta$ is in $L'$, it admits a decomposition $\beta = u\beta''$ with $u \in [s]$ and $\beta'' \in C^\infty$. Also, by $\beta \in [t][f]^\omega$ one gets $\beta = v\beta'$ with $v \in [t]$, $\beta' \in [f]^\omega$. Using $tf = t$ and $C \neq \emptyset$, we may assume that $|v| > |u|$, which implies $\beta' \in C^\infty$. Hence we have $t \in s \cdot h(C^*)$. By construction $\beta \notin \bigcup_{s \notin t \cdot h(C^*)} [t]C^\infty$ and therefore $s \in t \cdot h(C^*)$. It follows $s \cdot h(C^*) = t \cdot h(C^*)$. Since $\beta \notin \bigcup_{D \subsetneq C} \Gamma^* D^\infty$, we have $\mathrm{alph}(\beta') = C$. Using 3 it follows $\beta \in L$. □

The alphabetic topology above is a refinement of the well-known Cantor topology. The Cantor topology is given by the basis $w\Gamma^\infty$ for $w \in \Gamma^*$. A regular language $L$ is a Boolean combination of open sets in the Cantor topology if and only if $[s][e]^\omega \subseteq L \Leftrightarrow [t][f]^\omega \subseteq L$ for all linked pairs $(s, e)$ and $(t, f)$ of the syntactic monoid of $L$ with $s \mathrel{\mathcal{R}} t$, cf. [?]. Theorem 3 is a similar result, but one has to consider the alphabetic information of the linked pairs. Hence, one does not have $s \mathrel{\mathcal{R}} t$ as condition, but rather $\mathcal{R}$-equivalence within a certain alphabet $C$.
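
Condition 3 of Theorem 3 only refers to finitely many objects, so it can be checked by exhaustive search. The following Python sketch illustrates one way to do this; it is not the authors' algorithm. It assumes that the recognizing homomorphism is given by a dictionary `h` from letters to monoid elements, that `mul` multiplies two elements, and that `accepted` is the set of linked pairs $(s, e)$ with $[s][e]^\omega \subseteq L$. All names, and the encoding of the example monoid by pairs (first letter, last letter), are assumptions of this sketch. The example is a six-element monoid satisfying the equations $b^2 = b$, $bab = b$ and $x\,aa = aa\,x = aa$ that appear in Example 12.

```python
from itertools import combinations


def generated(gens, mul, one):
    """Submonoid generated by gens, computed by closure under right products."""
    gens = list(gens)
    seen, todo = {one}, [one]
    while todo:
        x = todo.pop()
        for g in gens:
            y = mul(x, g)
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def images_with_alphabet(C, h, mul, one):
    """All elements h(w) for finite words w with alph(w) = C."""
    start = (one, frozenset())
    seen, todo = {start}, [start]
    while todo:
        x, used = todo.pop()
        for c in C:
            nxt = (mul(x, h[c]), used | {c})
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {x for x, used in seen if used == frozenset(C)}


def satisfies_condition_3(h, mul, one, accepted):
    """Exhaustively check condition 3 of Theorem 3 on the image of h."""
    M = generated(h.values(), mul, one)
    linked = [(s, e) for s in M for e in M
              if mul(e, e) == e and mul(s, e) == s]
    letters = sorted(h)
    for size in range(len(letters) + 1):
        for C in combinations(letters, size):
            sub = generated((h[c] for c in C), mul, one)   # h(C*)
            full = images_with_alphabet(C, h, mul, one)    # h(w), alph(w) = C

            def coset(s):
                return frozenset(mul(s, x) for x in sub)   # s · h(C*)

            for s, e in linked:
                for t, f in linked:
                    if (e in full and f in full and coset(s) == coset(t)
                            and ((s, e) in accepted) != ((t, f) in accepted)):
                        return False
    return True


# Example monoid (assumed encoding): 1, the zero element aa, and pairs
# (first letter, last letter) standing for words without the factor aa.
ONE, ZERO = "1", "aa"


def mul(x, y):
    if x == ONE:
        return y
    if y == ONE:
        return x
    if x == ZERO or y == ZERO:
        return ZERO
    return ZERO if x[1] == "a" and y[0] == "a" else (x[0], y[1])


h = {"a": ("a", "a"), "b": ("b", "b")}
accepted = {(ZERO, ZERO)}          # only [aa][aa]^ω lies inside L
result = satisfies_condition_3(h, mul, ONE, accepted)
```

In this example `result` is `False`: for $C = \{a, b\}$ the linked pairs $(aa, aa)$ and $(aa, ab)$ have the same first component, but only the first one is accepted. This matches the failure of the topological condition discussed in Example 12.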

Remark 4. The strict alphabetic topology over $\Gamma^\infty$, which is introduced in [3], is given by the basis $\{uA^\infty \cap A^{\mathrm{im}} \mid u \in \Gamma^*, A \subseteq \Gamma\}$ and the open sets are of the form $\bigcup_A W_A A^\infty \cap A^{\mathrm{im}}$ with $W_A \subseteq \Gamma^*$. Reusing the proof of Theorem 3 it turns out that it is equivalent to be a Boolean combination of open sets in the alphabetic topology and in the strict alphabetic topology. Since $uA^\infty = \bigcup_{B \subseteq A} (uA^\infty \cap B^{\mathrm{im}})$, every open set in the alphabetic topology is also open in the strict alphabetic topology. Further, one can adapt the proof of '$2 \Rightarrow 3$' of Theorem 3 to show that if $L$ is a Boolean combination of open sets in the strict alphabetic topology, then item 3 of Theorem 3 holds.

4 The fragment $\mathbb{B}\Sigma_2$

Place and Zeitoun have shown that $\mathbb{B}\Sigma_2$ is decidable over finite words. In particular, they have shown that given the syntactic homomorphism of a language $L$, it is decidable if $L \in \mathbb{B}\Sigma_2$. Let $\mathbf{V}_2$ be the pseudovariety of monoids which corresponds to the language variety of all languages contained in $\mathbb{B}\Sigma_2$. Since every pseudovariety is generated by its syntactic monoids, the result of Place and Zeitoun can be stated as follows:

Theorem 5 ([13]). The pseudovariety $\mathbf{V}_2$ corresponding to the $\mathbb{B}\Sigma_2$-definable languages in $\Gamma^*$ is decidable.

The main part of the proof will be Proposition 7. The following lemma will be an auxiliary result for Proposition 7.

Lemma 6. There exists a number $\ell$ such that for every set $\{M_1, \ldots, M_d\}$ of $k$-monomials over $\Gamma^*$ and every $w$ with $w \in M_i$ for all $i \in \{1, \ldots, d\}$, there exists an $\ell$-monomial $N$ over $\Gamma^*$ with $w \in N$ and $N \subseteq \bigcap_i M_i$.

Proof. As one can iterate the statement, it suffices to show it for $d = 2$. Let $M_1 = A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^*$ and $M_2 = B_0^* b_1 B_1^* b_2 \cdots B_{m-1}^* b_m B_m^*$ be two monomials. Since $w \in M_1$ and $w \in M_2$, it admits factorizations $w = u_0 a_1 u_1 a_2 \cdots u_{n-1} a_n u_n$ and $w = v_0 b_1 v_1 b_2 \cdots v_{m-1} b_m v_m$ such that $u_i \in A_i^*$ and $v_i \in B_i^*$. The factorizations mark the positions of the $a_i$ and the $b_j$ and pose alphabetic conditions on the factors in between.

Figure 2: Different factorizations in the proof of Lemma 6. In the situation of the figure it holds, for example, $C_0 = A_0 \cap B_0$, $C_3 = A_2 \cap B_1$ and $C_\ell = A_n \cap B_m$.

Thus, there exists a factorization $w = w_0 c_1 w_1 c_2 \cdots w_{\ell-1} c_\ell w_\ell$ such that the positions of the $c_i$ are exactly those that are marked by some $a_j$ or $b_j$, i.e., $c_i = a_j$ or $c_i = b_j$ for some $j$. The words $w_i$ are over some alphabet $C_i$ such that $C_i = A_j \cap B_k$ for some $j$ and $k$ induced by the factorizations. In the case of consecutive marked positions, one can set $C_i = \emptyset$. Thus, we obtain a monomial $N = C_0^* c_1 C_1^* c_2 \cdots C_{\ell-1}^* c_\ell C_\ell^*$ with $C_\ell = A_n \cap B_m$. By construction $N \subseteq M_1$, $N \subseteq M_2$ and $w \in N$ holds. Since there are only finitely many monomials of degree $k$, the size of the number $\ell$ is bounded. □

An analysis of the proof of Lemma 6 yields that the bound $\ell \le n_k \cdot k$ holds, where $n_k$ is the number of distinct $k$-monomials over $\Gamma^*$. Next, we will show that a language whose syntactic monoid is in $\mathbf{V}_2$ and which is a Boolean combination of alphabetic open sets is a finite Boolean combination of monomials. One ingredient of the proof will be that by Lemma 6 we are able to compress the information of a set of $k$-monomials which contain a fixed word into the information that a single $\ell$-monomial contains that fixed word.
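
The construction in the proof of Lemma 6 for $d = 2$ can be sketched in Python as follows. The sketch assumes single-character letters and encodes a monomial over $\Gamma^*$ as a pair `(alphabets, letters)`; `factorize` and `common_refinement` are hypothetical helper names. As a simplification, every gap receives the intersection $A_j \cap B_k$, also between consecutive marked positions where the proof notes that one may choose $\emptyset$ instead; both choices keep $N$ inside $M_1 \cap M_2$.

```python
from functools import lru_cache


def factorize(word, alphabets, letters):
    """Positions of the marked letters in one factorization of word in
    A_0* a_1 A_1* ... a_n A_n*, or None if word is not in the monomial."""
    n = len(letters)

    @lru_cache(maxsize=None)
    def search(pos, i):
        # Match word[pos:] against A_i* a_{i+1} A_{i+1}* ... a_n A_n*.
        if i == n:
            ok = all(c in alphabets[n] for c in word[pos:])
            return () if ok else None
        for p in range(pos, len(word)):
            if word[p] == letters[i]:
                rest = search(p + 1, i + 1)
                if rest is not None:
                    return (p,) + rest
            if word[p] not in alphabets[i]:
                break  # the factor before the marked letter must stay in A_i*
        return None

    return search(0, 0)


def common_refinement(word, mono1, mono2):
    """Monomial N with word in N and N contained in both given monomials."""
    (A, a), (B, b) = mono1, mono2
    P, Q = factorize(word, A, a), factorize(word, B, b)
    if P is None or Q is None:
        raise ValueError("word must lie in both monomials")
    marked = sorted(set(P) | set(Q))
    alphabets, letters = [], []
    for p in marked + [len(word)]:
        j = sum(q < p for q in P)  # index of the A-block in front of p
        k = sum(q < p for q in Q)  # index of the B-block in front of p
        alphabets.append(set(A[j]) & set(B[k]))
        if p < len(word):
            letters.append(word[p])
    return alphabets, letters
```

For $w = ab$, $M_1 = \{a,b\}^* a \{a,b\}^*$ and $M_2 = \{a,b\}^* b \{a,b\}^*$ the result is $N = \{a,b\}^* a \{a,b\}^* b \{a,b\}^*$, whose degree is the number of distinct marked positions.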

Proposition 7. Let $L \subseteq \Gamma^\infty$ be a Boolean combination of alphabetic open sets such that $\mathrm{Synt}(L) \in \mathbf{V}_2$. Then $L$ is a finite Boolean combination of monomials.

Proof. Let $h : \Gamma^* \to \mathrm{Synt}(L)$ be the syntactic homomorphism of $L$ and consider the languages $h^{-1}(p)$ for $p \in \mathrm{Synt}(L)$. By Theorem 5 we obtain $h^{-1}(p) \in \mathbb{B}\Sigma_2$. Thus, there exists a number $k$ such that for every $p \in \mathrm{Synt}(L)$ the language $h^{-1}(p)$ is saturated by $\equiv_k$, i.e., $u \equiv_k v \Rightarrow h(u) = h(v)$. By Lemma 6 there exists a number $\ell$ such that for every set $\{M_1, \ldots, M_n\}$ of $k$-monomials and every $w$ with $w \in M_i$ for all $i \in \{1, \ldots, n\}$, there exists an $\ell$-monomial $N$ with $w \in N \subseteq \bigcap_{i=1}^n M_i$. Let $\alpha \equiv_\ell^\infty \beta$ and $\alpha \in L$. We show $\beta \in L$ which implies $L = \bigcup_{\alpha \in L} [\alpha]_\ell^\infty$ and thus that $L$ is a finite Boolean combination of $\ell$-monomials. By observing membership in $\Gamma^* C^\infty$, it is clear that $\mathrm{im}(\alpha) = \mathrm{im}(\beta) =: C$.

Let $u' \le \alpha$ and $v' \le \beta$ be prefixes such that for every $\ell$-monomial $N = N' \cdot C^\infty$ with $\alpha, \beta \in N$ some prefix of $u'$ and some prefix of $v'$ is in $N'$. Further, let $u, v$ be the shortest prefixes of $\alpha, \beta$ such that $u' \le u$, $v' \le v$ and for $C = \{c_1, \ldots, c_m\}$ the word $(c_1 c_2 \cdots c_m)^k$ is a subword of $u''$ and $v''$ with $u = u'u''$ and $v = v'v''$, i.e., we extend the words $u'$ and $v'$ such that the full imaginary alphabet appears often enough. Let $\alpha = u\alpha'$ and $\beta = v\beta'$.

Figure 3: Factorization of $\alpha$ and $\beta$ in the proof of Proposition 7.


We use Theorem 3 and show that for $s = h(u)$ and $t = h(v)$ we have $s \cdot h(C^*) = t \cdot h(C^*)$, which implies $\beta \in L$. By symmetry, it suffices to show $t \in s \cdot h(C^*)$. Consider the set $\{N_1, \ldots, N_d\}$ of $k$-monomials which hold at $u$, i.e., such that $u \in N_i$ and $\alpha' \in C^\infty$. By the choice of $\ell$, there exists an $\ell$-monomial $N'$ such that $u \in N'$ and $N' \subseteq \bigcap_i N_i$. Since $u \in N'$, we obtain $\alpha \in N := N'C^\infty$ and by $\alpha \equiv_\ell^\infty \beta$ the membership $\beta \in N$ holds. By construction of $v$, there exists a prefix $\hat{v} \le v' \le v$ such that $\hat{v} \in N'$ and $\hat{\beta} \in C^\infty$ with $\hat{\beta}$ being defined by $\beta = \hat{v}\hat{\beta}$. Let $v = \hat{v}x$, then $x \in C^*$. We show that $ux \equiv_k v$.

Thus, let $ux \in A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^*$ where the monomial has degree at most $k$, then there exists a factorization $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^* = M_1 M_2$ with $M_1, M_2$ $k$-monomials such that $u \in M_1$ and $x \in M_2$. By definition of $N'$ we have $u, \hat{v} \in N' \subseteq M_1$ and thus $\hat{v} \in M_1$. We conclude that $v = \hat{v}x \in M_1 M_2 = A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^*$.

Let now $v = \hat{v}x \in A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^*$. Again, there exists a factorization of the monomial $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^* = M_1 M_2$ with $M_1, M_2$ $k$-monomials such that $\hat{v} \in M_1$ and $x \in M_2$. Since $(c_1 c_2 \cdots c_m)^k$ is a subword of $x$, there must be an $A_i^*$ in $M_2$ such that $C \subseteq A_i$. Thus, there is a factorization $M_2 = M_{21} M_{22}$ in $k$-monomials $M_{21}, M_{22}$ such that $x' \in M_{21}$, $x'' \in M_{22}$ for $x = x'x''$ and we have $M_{21} \cdot C^* = M_{21}$. Consider $\beta = \hat{v}x\beta' \in M_1 M_{21} \cdot C^\infty$. Since $\alpha \equiv_\ell^\infty \beta$, we obtain $\alpha \in M_1 M_{21} \cdot C^\infty$. Thus, there is some prefix of $u$ in $M_1 M_{21}$ and by $M_{21} \cdot C^* = M_{21}$, we also obtain $ux' \in M_1 M_{21}$. Thus, $ux = ux' \cdot x'' \in M_1 M_{21} \cdot M_{22} = M_1 M_2 = A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^*$ holds. We conclude $ux \equiv_k v$ and thus $t = h(v) = h(u)h(x) \in s \cdot h(C^*)$. □

The direct product of homomorphisms $g : \Gamma^* \to M$ and $h : \Gamma^* \to N$ is given by $(g \times h) : \Gamma^* \to M \times N$, $w \mapsto (g(w), h(w))$. It is well-known that the direct product recognizes Boolean combinations:

Lemma 8. Let $L$ and $K$ be languages such that $L$ is recognized by $g : \Gamma^* \to M$ and $K$ is recognized by $h : \Gamma^* \to N$. Then, any Boolean combination of $L$ and $K$ is recognized by $(g \times h)$.

Proof. Since $L \cap [s][e]^\omega \neq \emptyset$ implies $[s][e]^\omega \subseteq L$ for every linked pair $(s, e)$, we obtain $\Gamma^\infty \setminus L = \bigcup \{ [s][e]^\omega \mid [s][e]^\omega \cap (\Gamma^\infty \setminus L) \neq \emptyset \}$ for the complement of $L$. Thus, it suffices to show that $L \cup K$ is recognized by $(g \times h)$. Obviously, $L$ is covered by the sets $[(s, t)][(e, f)]^\omega$, where $(s, e)$ is a linked pair of $M$ with $[s][e]^\omega \subseteq L$ and $(t, f)$ is any linked pair of $N$. Similarly one can cover $K$ and thus $M \times N$ recognizes $L \cup K$. □

Next, we show that the algebraic characterisation $\mathbf{V}_2$ of $\mathbb{B}\Sigma_2$ over finite words also holds over finite and infinite words simultaneously. The proof of this is based on the fact that the algebraic part of the characterisation of $\Sigma_2$ over finite words and over finite and infinite words is the same [3]. Since every language of $\Sigma_2$ is also a language of $\mathbb{B}\Sigma_2$, and thus $\mathbf{V}_{3/2} \subseteq \mathbf{V}_2$, combining this with Lemma 8 yields the characterization $\mathbf{V}_2$.

Lemma 9. If $L \subseteq \Gamma^\infty$ is definable in $\mathbb{B}\Sigma_2$, then $\mathrm{Synt}(L) \in \mathbf{V}_2$.

Proof. By definition, $L \in \mathbb{B}\Sigma_2$ implies that $L$ is a Boolean combination of languages $L_i \in \Sigma_2$. By [3] we have $\mathrm{Synt}(L_i) \in \mathbf{V}_{3/2}$ and thus $\mathrm{Synt}(L_i) \in \mathbf{V}_2$. Since $L$ is a Boolean combination of the $L_i$, $L$ is recognized by the direct product of all $\mathrm{Synt}(L_i)$ by Lemma 8. In particular, $\mathrm{Synt}(L)$ is a divisor of the direct product of the $\mathrm{Synt}(L_i)$ by Lemma 1. Hence, we obtain $\mathrm{Synt}(L) \in \mathbf{V}_2$. □

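The proof that monomials are definable in $\Sigma_2$ is straightforward.

Lemma 10. Let $L \subseteq \Gamma^\infty$ be a monomial of the form $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^\infty$. Then $L$ is definable in $\Sigma_2$ by a formula with quantifier depth at most $n + 1$.

Proof. A formula which describes exactly the elements of the monomial is

$$\exists x_1 \ldots \exists x_n \forall y : \bigwedge_{i=1}^{n} \lambda(x_i) = a_i \;\wedge\; \bigwedge_{i=1}^{n-1} \bigl( x_i < x_{i+1} \wedge (x_i < y < x_{i+1} \Rightarrow \lambda(y) \in A_i) \bigr) \;\wedge\; (y > x_n \Rightarrow \lambda(y) \in A_n) \;\wedge\; (y < x_1 \Rightarrow \lambda(y) \in A_0).$$

Here the conjuncts $x_i < x_{i+1}$ make explicit that the marked positions occur in the order $a_1, \ldots, a_n$. Hence $L$ is definable in $\Sigma_2$. □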

Combining our results we are ready to state and prove the main theorem of the paper.

Theorem 11. Let $L \subseteq \Gamma^\infty$ be $\omega$-regular. Then the following are equivalent:

1. $L$ is a finite Boolean combination of monomials of the form $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^\infty$.

2. $L$ is definable in $\mathbb{B}\Sigma_2$.

3. The syntactic homomorphism $h$ of $L$ satisfies:

a) $\mathrm{Synt}(L) \in \mathbf{V}_2$ and

b) for all linked pairs $(s, e), (t, f)$ it holds that if there exists an alphabet $C$ and words $\hat{e}, \hat{f}$ with $h(\hat{e}) = e$, $h(\hat{f}) = f$, $\mathrm{alph}(\hat{e}) = \mathrm{alph}(\hat{f}) = C$ and $s \cdot h(C^*) = t \cdot h(C^*)$, then $[s][e]^\omega \subseteq L \Leftrightarrow [t][f]^\omega \subseteq L$.

Proof. '$1 \Rightarrow 2$': Since $\mathbb{B}\Sigma_2$ is closed under Boolean combinations, it suffices to find a formula in $\Sigma_2$ for the monomials of the form $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^\infty$. Hence Lemma 10 completes the proof.

'$2 \Rightarrow 3$': Item 3a is proved by Lemma 9. Since $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n$ is a set of finite words, a monomial $A_0^* a_1 A_1^* a_2 \cdots A_{n-1}^* a_n A_n^\infty$ is open in the alphabetic topology by definition. The languages in $\Sigma_2$ are unions of such monomials [3] and thus languages in $\mathbb{B}\Sigma_2$ are Boolean combinations of open sets. This implies 3b by Theorem 3.

'$3 \Rightarrow 1$': This is Proposition 7. □

Example 12. In this example we show that the topological property is necessary. For this define $L = (\{a,b\}^* aa \{a,b\}^*)^\omega$. We will show that $\mathrm{Synt}(L) \in \mathbf{V}_2$, but $L$ is not a Boolean combination of open sets of the alphabetic topology. Computing the syntactic monoid of $L$ yields $\mathrm{Synt}(L) = \{1, a, b, aa, ab, ba\}$. The equations $b^2 = b$, $x\,aa = aa\,x = aa$ and $bab = b$ hold in $\mathrm{Synt}(L)$. In particular, $(ab)^2 = ab$ and $(aa)^2 = aa$. Thus, $(s, e) = (aa, aa)$ and $(t, f) = (aa, ab)$ are linked pairs. Let $h$ denote the syntactic homomorphism of $L$. Choosing $aab$ as a preimage for $aa \in \mathrm{Synt}(L)$ yields the alphabetic condition $\mathrm{alph}(aab) = \mathrm{alph}(ab) = C$ on the idempotents. Since $s = t$, we trivially have $s \cdot h(C^*) = t \cdot h(C^*)$. However, $[aa][ab]^\omega \cap L = \emptyset$ but $[aa]^\omega \subseteq L$. Thus, $L$ does not satisfy the topological condition. It remains to check $\mathrm{Synt}(L) \in \mathbf{V}_2$. It is enough to show that the preimages are in $\mathbb{B}\Sigma_2$.

• $[1] = 1$
• $[a] = a$
• $[b] = b^+ \cup (b^+ a b^+)^+$
• $[ab] = (ab)^+$
• $[ba] = (ba)^+$
• $[aa] = \{a,b\}^* aa \{a,b\}^*$

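One can find $\mathbb{B}\Sigma_2$ formulas for these languages, e.g., $[ab] = L(\varphi)$ with

$$\varphi = (\exists x \forall y : x \le y \wedge \lambda(x) = a) \;\wedge\; (\exists x \forall y : x \ge y \wedge \lambda(x) = b) \;\wedge\; \bigl(\forall x \forall y : x \ge y \vee (\exists z : x < z < y) \vee \lambda(x) \neq \lambda(y)\bigr)$$

and thus $\mathrm{Synt}(L) \in \mathbf{V}_2$.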

5 Summary and Open Problems 

The alphabetic topology is an essential ingredient in the study of the fragment $\Sigma_2$. Thus, in order to study Boolean combinations of $\Sigma_2$ formulas, i.e., the fragment $\mathbb{B}\Sigma_2$ over infinite words, we looked closely at properties of Boolean combinations of its open sets. It turns out that it is decidable whether a regular language is a Boolean combination of open sets. This does not follow immediately from the decidability of the open sets. We used linked pairs of the syntactic homomorphism (which are effectively computable) to get decidability of the topological condition. Combining this result with the decidability of $\mathbf{V}_2$ we obtained an effective characterization of $\mathbb{B}\Sigma_2$ over $\Gamma^\infty$, the finite and infinite words over the alphabet $\Gamma$.

In this paper we dealt with $\mathbb{B}\Sigma_2$, which is the second level of the Straubing-Thérien hierarchy. Another well-known hierarchy is the dot-depth hierarchy. On the level of logic, the difference between the Straubing-Thérien hierarchy and the dot-depth hierarchy is that formulas for the dot-depth hierarchy may also use the successor predicate. A deep result of Straubing is that over finite words each level of the Straubing-Thérien hierarchy is decidable if and only if it is decidable in the dot-depth hierarchy [17]. Thus, the decidability result for $\mathbb{B}\Sigma_2$ by Place and Zeitoun also yields a decidability result for $\mathbb{B}\Sigma_2[<, +1]$. The fragment $\Sigma_2[<, +1]$ is decidable for $\omega$-regular languages [?]. This result also uses topological ideas, namely the factor topology. The open sets in this topology describe which factors of a certain length $k$ may appear in the "infinite part" of the words. The study of Boolean combinations of open sets in the factor topology is an interesting line of future work, and it may yield a decidability result for $\mathbb{B}\Sigma_2[<, +1]$ over infinite words.

Another interesting class of predicates are modular predicates. In [7] the authors have studied $\Sigma_2[<, \mathrm{MOD}]$ over finite words. The results of [7] can be generalised to infinite words by adapting the alphabetic topology to the modular setting. As for successor predicates, we believe that an appropriate effective characterization of this topology might help in deciding $\mathbb{B}\Sigma_2[<, \mathrm{MOD}]$ over infinite words. To the best of our knowledge, however, modular predicates have not yet been considered over infinite words.